test(client): add pathspec coverage for Argo and Step Functions run IDs#3264

Open
dheerenmohta wants to merge 2 commits into Netflix:master from dheerenmohta:test/client-pathspec-orchestrator-run-ids

Conversation

@dheerenmohta

Contributor

Problem

test_pathspec_validation.py (added in #948) only covers numeric run IDs. As a result, the regression in which Run('MyFlow/argo-...') raises MetaflowInvalidPathspec has no test coverage and would go undetected.

Fix

Add two new test methods:

test_run_valid_orchestrator_run_ids

Verifies that the following formats are accepted as valid run pathspecs:

  • "MovieStatsFlow/argo-moviestatsflow-68z2h" (matches the tutorial example in 08-autopilot)
  • "MyFlow/sfn-my-execution-name" (AWS Step Functions format)

test_task_valid_orchestrator_run_and_task_ids

Same coverage for task-level pathspecs where the run ID is an orchestrator-style string.

Also removes test cases from test_run_invalid_pathspecs that asserted orchestrator-style strings (e.g. "MyFlow/abc") should be rejected — those cases are now intentionally valid.

Companion to PR #1 (fix/client-argo-sfn-run-id-validation).

Relates to #948

Fixes Netflix#948

Previously, pathspec validation only checked the number of components
(e.g., "FlowName/RunID" has 2 parts). This caused confusing errors when
users made typos or used invalid characters, since the error would come
later from the metadata provider rather than at creation time.

This change adds proper format validation:
- Flow/Step/Artifact names must be valid identifiers (start with letter/underscore)
- Run IDs and Task IDs must be numeric
- Empty components are rejected (e.g., "Flow//Step")
- Leading/trailing slashes are rejected

The validation provides clear, actionable error messages that tell users
exactly what's wrong with their pathspec. For example:
  - "Invalid flow name '123Flow'. Flow names must start with a letter..."
  - "Invalid run ID 'abc'. Run IDs must be numeric."
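The rules listed above can be sketched as a small standalone validator. This is a minimal illustration, not the code in metaflow/client/core.py: the function name `validate_pathspec`, the `InvalidPathspec` exception, and the `_EXPECTED` table are hypothetical stand-ins, and the message wording is only loosely modelled on the examples above.

```python
import re

# Patterns follow the rules in the commit message; all names are illustrative.
_NAME = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")  # flow / step names
_NUMERIC_ID = re.compile(r"[0-9]+")            # run IDs and task IDs

# Component labels expected at each pathspec depth (hypothetical table).
_EXPECTED = {
    "flow": ["flow name"],
    "run": ["flow name", "run ID"],
    "step": ["flow name", "run ID", "step name"],
    "task": ["flow name", "run ID", "step name", "task ID"],
}


class InvalidPathspec(ValueError):
    """Stand-in for MetaflowInvalidPathspec."""


def validate_pathspec(pathspec, kind):
    """Return the pathspec components, or raise InvalidPathspec."""
    labels = _EXPECTED[kind]
    parts = pathspec.split("/")
    # Leading/trailing slashes and "Flow//Step" all show up as empty parts.
    if any(part == "" for part in parts):
        raise InvalidPathspec("Invalid pathspec %r: empty component." % pathspec)
    if len(parts) != len(labels):
        raise InvalidPathspec(
            "Invalid pathspec %r: expected %d components, got %d."
            % (pathspec, len(labels), len(parts))
        )
    for part, label in zip(parts, labels):
        pattern = _NUMERIC_ID if label.endswith("ID") else _NAME
        if not pattern.fullmatch(part):
            raise InvalidPathspec("Invalid %s %r." % (label, part))
    return parts
```

Under these rules `validate_pathspec("MyFlow/abc", "run")` raises, which is exactly the behaviour that affects orchestrator-style run IDs.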

Added comprehensive unit tests covering valid and invalid cases.

The existing test suite only covered numeric run IDs, which meant Bug 1
(non-numeric orchestrator run IDs raising MetaflowInvalidPathspec) had
no regression coverage.

Add test_run_valid_orchestrator_run_ids and
test_task_valid_orchestrator_run_and_task_ids to explicitly exercise the
"argo-*" and "sfn-*" run ID formats that Argo Workflows and AWS Step
Functions produce respectively.

Also tighten test_run_invalid_pathspecs to remove cases that assumed
purely numeric run IDs (e.g. "MyFlow/abc") since those are now valid
orchestrator-style IDs.

Relates to Netflix#948
@greptile-apps

greptile-apps Bot commented Jun 11, 2026

Contributor

Greptile Summary

This PR adds per-component pathspec validation to MetaflowObject._validate_pathspec_format and a new test file with coverage for Argo Workflows and AWS Step Functions run IDs. However, the core fix is incomplete: _NUMERIC_ID_PATTERN in core.py still enforces digits-only for run IDs, so the two new orchestrator-ID test methods will fail against the code shipped in this same PR.

  • metaflow/client/core.py: Introduces _validate_pathspec_format that validates each pathspec component by position, but the run-ID branch still uses ^[0-9]+$, rejecting argo-... and sfn-... strings the PR intends to allow.
  • test/unit/test_pathspec_validation.py: Adds 270 lines of unit tests; test_run_valid_orchestrator_run_ids and test_task_valid_orchestrator_run_and_task_ids will raise MetaflowInvalidPathspec (not pass) with the current validator; test_error_messages_are_helpful also hard-codes "numeric" in the error-message assertion, which will need updating once the regex is widened.

Confidence Score: 3/5

The two new orchestrator-ID tests will fail immediately because the run-ID regex in core.py was never updated to accept non-numeric prefixes; merging as-is adds broken tests to the suite.

The added validation method in core.py still uses a digits-only pattern for run IDs, so the very tests this PR introduces to prove the Argo/SFN fix will raise MetaflowInvalidPathspec instead of passing. The companion regex change is absent from core.py, leaving the test file and the production code in direct contradiction.

Both changed files need attention: metaflow/client/core.py needs the run-ID pattern widened to accept orchestrator prefixes, and test/unit/test_pathspec_validation.py needs the "numeric" error-message assertion updated to match the new pattern.

Important Files Changed

| Filename | Overview |
| --- | --- |
| metaflow/client/core.py | Adds _validate_pathspec_format static method with per-component regex validation, but _NUMERIC_ID_PATTERN (digits-only) is still used for run IDs, so the orchestrator-ID tests added in this same PR will fail immediately. |
| test/unit/test_pathspec_validation.py | New test file with good coverage, but test_run_valid_orchestrator_run_ids and test_task_valid_orchestrator_run_and_task_ids will fail against the current code; test_error_messages_are_helpful also hard-codes "numeric" in the error-message assertion, which conflicts with the intended broader validation. |

Reviews (1): Last reviewed commit: "test(client): add pathspec tests for Arg..."

Comment thread metaflow/client/core.py
# Flow names and step names should be valid Python identifiers (start with letter/underscore)
_FLOW_NAME_PATTERN = re.compile(r'^[a-zA-Z_][a-zA-Z0-9_]*$')
# Run IDs and Task IDs are numeric
_NUMERIC_ID_PATTERN = re.compile(r'^[0-9]+$')

P1 Run ID pattern still numeric-only, breaking the new tests

_NUMERIC_ID_PATTERN only matches ^[0-9]+$, so _validate_pathspec_format at lines 347–351 will raise MetaflowInvalidPathspec for every Argo/SFN run ID (e.g. argo-moviestatsflow-68z2h, sfn-my-execution-name). test_run_valid_orchestrator_run_ids and test_task_valid_orchestrator_run_and_task_ids both call the same method and expect those pathspecs to return successfully — those tests will fail against this PR's code as written. The companion fix that widens the run-ID pattern (or introduces a separate _ORCHESTRATOR_RUN_ID_PATTERN) is not present in core.py here.
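One way to close this gap, sketched under assumptions: keep the numeric pattern and add a second pattern for orchestrator-style IDs. `_ORCHESTRATOR_RUN_ID_PATTERN` is the name floated in the comment above; the helper `is_valid_run_id` and the exact character set (an `argo-` or `sfn-` prefix followed by letters, digits, dots, underscores, or dashes) are guesses based on the two example IDs in this PR, not a documented Metaflow format.

```python
import re

_NUMERIC_ID_PATTERN = re.compile(r"^[0-9]+$")
# Assumed shape: "argo-" or "sfn-" followed by at least one name character.
_ORCHESTRATOR_RUN_ID_PATTERN = re.compile(
    r"^(argo|sfn)-[A-Za-z0-9][A-Za-z0-9._-]*$"
)


def is_valid_run_id(run_id):
    """Return True for numeric IDs and for argo-/sfn- prefixed IDs."""
    return bool(
        _NUMERIC_ID_PATTERN.match(run_id)
        or _ORCHESTRATOR_RUN_ID_PATTERN.match(run_id)
    )
```

Note that a prefix-based pattern like this would still reject "abc", which the PR description treats as a valid run ID; accepting arbitrary non-numeric IDs would need a broader pattern.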

Comment on lines +254 to +260
# Test invalid run ID
with self.assertRaises(MetaflowInvalidPathspec) as cm:
    MetaflowObject._validate_pathspec_format("MyFlow/abc", "run")
error_msg = str(cm.exception)
self.assertIn("abc", error_msg)
self.assertIn("run ID", error_msg)
self.assertIn("numeric", error_msg.lower())

P2 Error-message assertion contradicts orchestrator-ID intent

This block asserts that "MyFlow/abc" raises with "numeric" in the message. Once the validation is fixed to accept argo- / sfn- prefixed IDs (the missing companion change), the error message "Run IDs must be numeric" will no longer be accurate for the broader class of valid formats, and assertIn("numeric", ...) will create a confusing contract. Consider updating this assertion to match whatever the new error message says (e.g. "must be a valid run ID") at the same time the regex is widened.
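A sketch of an assertion that would survive the regex change: check that the message echoes the offending value and names the component, rather than depending on the word "numeric". The validator `validate_run_pathspec`, the `InvalidPathspec` exception, the widened pattern, and the message wording are all hypothetical stand-ins; the input "MyFlow/bad id!" is chosen on the assumption that spaces and "!" stay invalid under any widened pattern.

```python
import re
import unittest

# Hypothetical widened pattern: letters, digits, dot, underscore, dash.
_RUN_ID_PATTERN = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*$")


class InvalidPathspec(ValueError):
    """Stand-in for MetaflowInvalidPathspec."""


def validate_run_pathspec(pathspec):
    """Split "Flow/RunID" and reject run IDs outside the assumed pattern."""
    flow, _, run_id = pathspec.partition("/")
    if not _RUN_ID_PATTERN.match(run_id):
        raise InvalidPathspec(
            "Invalid run ID %r in pathspec %r." % (run_id, pathspec)
        )
    return flow, run_id


class TestErrorMessage(unittest.TestCase):
    def test_message_names_value_and_component(self):
        with self.assertRaises(InvalidPathspec) as cm:
            validate_run_pathspec("MyFlow/bad id!")
        error_msg = str(cm.exception)
        self.assertIn("bad id!", error_msg)  # offending value is echoed
        self.assertIn("run ID", error_msg)   # component is named
```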

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Comment on lines +115 to +129
def test_step_invalid_pathspecs(self):
    """Test that invalid step pathspecs are rejected."""
    invalid_cases = [
        ("MyFlow/123", "too few components"),
        ("MyFlow/123/start/extra", "too many components"),
        ("MyFlow/abc/start", "non-numeric run ID"),
        ("MyFlow/123/123step", "step name starts with number"),
        ("MyFlow/123/my-step", "step name contains dash"),
        ("MyFlow//start", "empty run ID"),
        ("MyFlow/123//", "empty step name"),
    ]
    for pathspec, description in invalid_cases:
        with self.subTest(pathspec=pathspec, reason=description):
            with self.assertRaises(MetaflowInvalidPathspec):
                MetaflowObject._validate_pathspec_format(pathspec, "step")

P2 Step-level orchestrator run ID not covered

test_step_invalid_pathspecs keeps ("MyFlow/abc/start", "non-numeric run ID") as a rejection case, but there is no positive test for a step pathspec that contains an orchestrator run ID (e.g. "MyFlow/argo-myflow-abc12/start"). test_task_valid_orchestrator_run_and_task_ids covers the task level, but the step and artifact levels have the same underlying validation and should also get orchestrator-ID positive cases to prevent future regressions.
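Such positive cases could mirror the structure of test_step_invalid_pathspecs. The sketch below uses a hypothetical `validate_step_pathspec` stand-in so that it runs on its own; in the PR's test file the call would instead go to MetaflowObject._validate_pathspec_format(pathspec, "step"), which, per the P1 comment, would only pass once the run-ID pattern is widened. The run-ID pattern and the test name here are assumptions.

```python
import re
import unittest

_NAME = re.compile(r"^[a-zA-Z_][a-zA-Z0-9_]*$")
# Assumed widened run-ID pattern; the real one is not part of this PR.
_RUN_ID = re.compile(r"^([0-9]+|(argo|sfn)-[A-Za-z0-9._-]+)$")


def validate_step_pathspec(pathspec):
    """Stand-in validator: returns the components or raises ValueError."""
    parts = pathspec.split("/")
    if len(parts) != 3:
        raise ValueError("expected Flow/RunID/Step, got %r" % pathspec)
    flow, run_id, step = parts
    if not (_NAME.match(flow) and _RUN_ID.match(run_id) and _NAME.match(step)):
        raise ValueError("invalid component in %r" % pathspec)
    return parts


class TestStepOrchestratorRunIds(unittest.TestCase):
    def test_step_valid_orchestrator_run_ids(self):
        valid_cases = [
            "MyFlow/123/start",
            "MyFlow/argo-myflow-abc12/start",
            "MyFlow/sfn-my-execution-name/end",
        ]
        for pathspec in valid_cases:
            with self.subTest(pathspec=pathspec):
                self.assertEqual(
                    validate_step_pathspec(pathspec), pathspec.split("/")
                )
```

The same list-plus-subTest shape would extend to artifact-level pathspecs by adding a fourth and fifth component to each case.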

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

@codecov

codecov Bot commented Jun 11, 2026

Codecov Report

❌ Patch coverage is 94.59459% with 2 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (master@6a634bd). Learn more about missing BASE report.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| metaflow/client/core.py | 94.59% | 0 Missing and 2 partials ⚠️ |
Additional details and impacted files
@@            Coverage Diff            @@
##             master    #3264   +/-   ##
=========================================
  Coverage          ?   29.04%           
=========================================
  Files             ?      381           
  Lines             ?    52539           
  Branches          ?     9277           
=========================================
  Hits              ?    15260           
  Misses            ?    36252           
  Partials          ?     1027           
